Credit Risk Prediction Using Ensemble Learning

Tags: Python, Machine Learning, Ensemble Learning, Voting Classifier, Classification, Streamlit

Project Overview

Predicting loan default is crucial for financial institutions that need to minimize risk and make sound lending decisions. This project uses machine learning to estimate a borrower's likelihood of default from key borrower and loan attributes. It trains a Random Forest (itself an ensemble of decision trees) and a Logistic Regression model, then combines them in a Voting Classifier. The goal is higher predictive accuracy than a single model and data-driven insight into lending risk.

Key Insights

Imputed missing values in the loan_int_rate column with the column median to keep the data consistent.
Trained Random Forest and Logistic Regression models individually to compare their performance.
Combined both models in a Voting Classifier, reaching 91% accuracy while balancing precision and recall.
Developed a Streamlit web app for interactive loan risk assessment.
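A minimal pandas sketch of the median imputation described above. It assumes the data sits in a DataFrame with a numeric `loan_int_rate` column. The helper name `impute_median` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_median(df: pd.DataFrame, column: str = "loan_int_rate") -> pd.DataFrame:
    """Return a copy of df with NaNs in `column` replaced by that column's median."""
    out = df.copy()  # leave the caller's DataFrame untouched
    median = out[column].median()  # pandas skips NaN when computing the median
    out[column] = out[column].fillna(median)
    return out


# Hypothetical toy data: one loan is missing its interest rate.
loans = pd.DataFrame(
    {
        "loan_amnt": [5000, 12000, 8000, 3000],
        "loan_int_rate": [10.5, np.nan, 13.0, 11.5],
    }
)
clean = impute_median(loans)
print(clean)
```

The missing rate is filled with 11.5, the median of the three known rates. One reason to prefer the median over the mean is that it is less sensitive to a few extreme interest rates.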

Technical Implementation

Dataset Features:
- Borrower Attributes: Age, Income, Home Ownership, Credit History
- Loan Attributes: Amount, Interest Rate, Loan Intent, Loan Grade
- Target and Risk Indicators: Loan Status (default or non-default; the label the model predicts), Credit History Length
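One way to turn these attributes into model inputs is a scikit-learn `ColumnTransformer`. Numeric columns are median-imputed and scaled, categorical ones are one-hot encoded, and Loan Status is held out as the label. The column names below (other than `loan_int_rate`) and the toy rows are assumptions for illustration, not the project's actual schema.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; only loan_int_rate is named in the write-up.
NUMERIC_COLS = ["person_age", "person_income", "loan_amnt",
                "loan_int_rate", "cb_person_cred_hist_length"]
CATEGORICAL_COLS = ["person_home_ownership", "loan_intent", "loan_grade"]
TARGET_COL = "loan_status"  # assumed name of the default / non-default label


def build_preprocessor() -> ColumnTransformer:
    # Numeric: fill gaps with the median, then standardize.
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    # Categorical: one column per category; unseen categories become all zeros.
    categorical = OneHotEncoder(handle_unknown="ignore")
    return ColumnTransformer(
        [("num", numeric, NUMERIC_COLS), ("cat", categorical, CATEGORICAL_COLS)],
        sparse_threshold=0.0,  # always return a dense NumPy array
    )


# Hypothetical toy rows in the assumed schema.
df = pd.DataFrame({
    "person_age": [22, 35, 41, 29],
    "person_income": [28000, 72000, 95000, 40000],
    "loan_amnt": [5000, 12000, 20000, 7000],
    "loan_int_rate": [11.2, np.nan, 7.9, 13.5],
    "cb_person_cred_hist_length": [2, 9, 15, 4],
    "person_home_ownership": ["RENT", "MORTGAGE", "OWN", "RENT"],
    "loan_intent": ["EDUCATION", "MEDICAL", "VENTURE", "EDUCATION"],
    "loan_grade": ["B", "A", "A", "C"],
    "loan_status": [1, 0, 0, 1],
})

X = df.drop(columns=[TARGET_COL])  # features only
y = df[TARGET_COL]                 # label kept separate
X_out = build_preprocessor().fit_transform(X)
print(X_out.shape)
```

Fitting the imputer inside the pipeline means the median is learned from the training split only, which avoids leaking test-set statistics. This is a design choice of the sketch, not necessarily what the project did.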
Modeling Techniques:
- Random Forest: Captures complex patterns but may overfit.
- Logistic Regression: Offers an interpretable linear model that is less prone to overfitting.
- Voting Classifier: Combines both models to improve accuracy and balance false positives and false negatives.
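A sketch of the two-model ensemble with scikit-learn's `VotingClassifier`. Soft voting (averaging predicted probabilities), the hyperparameters, and the synthetic data from `make_classification` are all assumptions. The write-up does not state them, and the accuracy printed here says nothing about the reported 91%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_voting_model(random_state: int = 42) -> VotingClassifier:
    # Flexible, nonlinear model that can overfit on its own.
    rf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    # Linear, interpretable model; scaling helps the solver converge.
    lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    # Soft voting averages the two models' predicted class probabilities.
    return VotingClassifier(estimators=[("rf", rf), ("lr", lr)], voting="soft")


# Synthetic, imbalanced stand-in for the preprocessed loan data.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.78, 0.22], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = build_voting_model().fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy on synthetic data: {acc:.3f}")
```

With `voting="hard"` the ensemble would instead take a majority vote of predicted labels. With only two models a hard vote can tie, which is one reason to prefer soft voting in a two-model setup.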
Evaluation Metrics:
- Voting Classifier Accuracy: 91%
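- Precision (safe loans as the positive class): Most loans the model labels safe are actually repaid, which limits risky approvals.
- Recall: Few safe borrowers are misclassified as risky (few false negatives), which reduces wrongful rejections.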
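A small worked example of the precision/recall framing above, treating safe loans as the positive class. The label encoding (1 = default, 0 = safe), the helper name `safe_loan_report`, and the ten example predictions are made up for illustration.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Assumed label encoding: 1 = default (risky), 0 = repaid (safe).
SAFE, DEFAULT = 0, 1


def safe_loan_report(y_true, y_pred):
    # Rows are true labels, columns are predictions, both ordered [SAFE, DEFAULT].
    cm = confusion_matrix(y_true, y_pred, labels=[SAFE, DEFAULT])
    return {
        "precision_safe": precision_score(y_true, y_pred, pos_label=SAFE),
        "recall_safe": recall_score(y_true, y_pred, pos_label=SAFE),
        "false_approvals": int(cm[1, 0]),   # risky loans predicted safe
        "false_rejections": int(cm[0, 1]),  # safe loans predicted risky
    }


# Hypothetical labels and predictions for ten loans.
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
report = safe_loan_report(y_true, y_pred)
print(report)
```

Here the model approves one risky loan and rejects two safe ones, so precision for safe loans is 5/6 (five of six "safe" predictions are correct) and recall is 5/7 (five of seven safe borrowers are approved). Raising one usually costs some of the other, which is the trade-off the Voting Classifier is meant to balance.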
Web Deployment: Built a Streamlit web application where users enter borrower details and receive a loan default risk prediction.

Key Learnings

Handling missing data carefully makes the model more reliable.
Ensemble learning improves model performance by leveraging strengths from different classifiers.
Balancing precision and recall is crucial for financial applications to minimize both false approvals (risky loans granted) and false rejections (safe loans denied).
Interactive web apps built with Streamlit make the model easier to access and use.
